for those who want to try simulating things in real time
QR code pointing to this link: https://github.com/sitalaura/link-functions/tree/main/R
or download the file from this path: sitalaura.github.io/link-functions/R/datasim.R
independent variable: age in years (years)
dependent variable: (unnamed continuous variable)
[add dataset screenshot]
using the classical linear predictor
what we don't see, because it is a default parameter, but is actually hidden in our code:
the model uses the gaussian family and the identity link function
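This default can be checked directly; a minimal sketch with simulated data (the names `d`, `y`, and the parameter values are illustrative, not the slide's dataset):

```r
set.seed(1)
d <- data.frame(age = runif(100, 6, 11))
d$y <- 0.5 + 0.05 * d$age + rnorm(100, sd = 0.1)

# glm() with no family argument silently falls back to
# gaussian(link = "identity"), so these two fits are identical
fit_default  <- glm(y ~ age, data = d)
fit_explicit <- glm(y ~ age, data = d, family = gaussian(link = "identity"))

all.equal(coef(fit_default), coef(fit_explicit))  # TRUE
```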
the link function in GLMs re-maps the linear predictor Xβ
onto the appropriate range of the response variable Y (strictly, this back-mapping is done by the inverse link)
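As a concrete illustration with base-R functions (not from the slides): the logit link and its inverse move between the probability scale (0, 1) and the unbounded linear-predictor scale.

```r
p    <- c(0.1, 0.5, 0.9)
eta  <- qlogis(p)    # link: probabilities -> unbounded linear-predictor scale
back <- plogis(eta)  # inverse link: squashes any real number back into (0, 1)

# even extreme linear-predictor values stay inside (0, 1)
plogis(c(-50, 50))
```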
independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
[add dataset screenshot]
using the classical linear predictor
library(effects)  # allEffects() comes from the effects package

# no family argument: defaults to gaussian(link = "identity")
fitL <- glm(accuracy ~ age, data = d)
effL <- data.frame(
allEffects(
fitL,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age"]]
)
ggplot(d, aes(x = age, y = accuracy)) +
coord_cartesian(ylim = c(0, 1)) +
geom_point(size = 4, alpha = .5, color = "darkblue") +
geom_ribbon(
data = effL,
aes(x = age, ymin = lower, ymax = upper),
alpha = .3, fill = "darkred", color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effL,
aes(x = age, y = fit),
linewidth = 2, color = "darkred",
inherit.aes = FALSE
) +
theme(text = element_text(size = ts, color = "black")) +
scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
scale_y_continuous(breaks = seq(0, 1, .1)) +
ylab("accuracy") + xlab("Age (years)")
does this model help us predict the data?
no, because at 11 years of age children would have an accuracy of 110%
[FIGURE]
this is in fact what happens at 11 years: posterior model prediction
[FIGURE, to the right of the previous one]
IN THE FIRST EXAMPLE an identity link was appropriate because the dependent variable spans from -inf to +inf
here an identity link is NOT appropriate because the dependent variable (accuracy) spans from 0 to 1
in this case, link = "logit" makes sure that y stays between 0 and 1
# the logit fit itself is not shown on the slide; by analogy with the
# interaction fits below (assuming k trials per child):
fitLogit <- glm(accuracy ~ age, data = d,
                family = binomial(link = "logit"),
                weights = rep(k, nrow(d)))
effLogit <- data.frame(
allEffects(
fitLogit,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age"]]
)
p1 <- ggplot(d, aes(x = age, y = accuracy)) +
coord_cartesian(ylim = c(0, 1)) +
geom_point(size = 4, alpha = .5, color = "darkblue") +
geom_ribbon(
data = effLogit,
aes(x = age, ymin = lower, ymax = upper),
alpha = .25, fill = "blue", color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effLogit,
aes(x = age, y = fit),
linewidth = 2, color = "blue",
inherit.aes = FALSE
) +
theme(text = element_text(size = ts, color = "black")) +
scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
scale_y_continuous(breaks = seq(0, 1, .1)) +
labs(y = "accuracy", x = "Age (years)")
p1

p2 <- ggplot(d, aes(x = age, y = accuracy)) +
coord_cartesian(ylim = c(0, 1)) +
geom_point(size = 4, alpha = .5, color = "darkblue") +
geom_ribbon(
data = effLogit,
aes(x = age, ymin = lower, ymax = upper),
alpha = .25, fill = "blue", color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effLogit,
aes(x = age, y = fit),
linewidth = 2, color = "blue",
inherit.aes = FALSE
) +
geom_ribbon(
data = effL,
aes(x = age, ymin = lower, ymax = upper),
alpha = .3, fill = "darkred", color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effL,
aes(x = age, y = fit),
linewidth = 2, color = "darkred",
inherit.aes = FALSE
) +
theme(text = element_text(size = ts, color = "black")) +
scale_x_continuous(
breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)
) +
scale_y_continuous(
breaks = seq(0, 1, .1)
) +
labs(y = "accuracy", x = "Age (years)")
p1 | p2  # side-by-side layout via the patchwork package

independent variable: age in years (years)
dependent variable: mistakes in a TRUE/FALSE task (accuracy)
adding a new main effect
groups: normal kids (group = 0) vs kids with dyslexia (group = 1)
using a linear model with family = gaussian(link = "identity"), a positive interaction emerges
Call:
glm(formula = accuracy ~ age * group, data = d)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.948422 0.002505 378.65 <2e-16 ***
age 0.050033 0.002172 23.03 <2e-16 ***
group1 -0.095498 0.003646 -26.19 <2e-16 ***
age:group1 0.067209 0.003119 21.55 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.003312306)
Null deviance: 16.5131 on 999 degrees of freedom
Residual deviance: 3.2991 on 996 degrees of freedom
AIC: -2866.2
Number of Fisher Scoring iterations: 2
d$group <- as.factor(d$group)

# the interaction fit summarized above (gaussian family, identity link by default)
fitLint <- glm(accuracy ~ age * group, data = d)
eff_obj <- allEffects(
fitLint,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age:group"]]
effL <- as.data.frame(eff_obj)
if (!("fit" %in% names(effL))) {
pred_col <- intersect(c("fit", "response", "fitted", "yhat"), names(effL))
if (length(pred_col) == 0) stop("No prediction column found in effL.")
effL$fit <- effL[[pred_col[1]]]
}
ggplot(d, aes(x = age, y = accuracy, shape = group, color = group)) +
geom_point(size = 4, alpha = .6) +
geom_ribbon(
data = effL,
aes(x = age, ymin = lower, ymax = upper, fill = group, group = group),
alpha = .3,
color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effL,
aes(x = age, y = fit, color = group, group = group),
linewidth = 2,
inherit.aes = FALSE
) +
scale_color_manual(values = c("darkorange3", "darkgreen")) +
scale_fill_manual(values = c("darkorange3", "darkgreen")) +
theme(text = element_text(size = ts, color = tc)) +
scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
scale_y_continuous(breaks = seq(0, 1, .1), limits = c(0, 1)) +
labs(y = "Accuracy", x = "Age (years)")

using a GLM with family = binomial(link = "logit"), a negative interaction emerges
fitLogitint <- glm(accuracy ~ age * group, data = d,
                   family = binomial(link = "logit"),
                   weights = rep(k, nrow(d)))
summary(fitLogitint)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "logit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.18427 0.06689 62.554 < 2e-16 ***
age 1.67118 0.04836 34.559 < 2e-16 ***
group1 -1.71205 0.07406 -23.118 < 2e-16 ***
age:group1 -0.37428 0.05454 -6.863 6.76e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8867.4 on 999 degrees of freedom
Residual deviance: 1005.6 on 996 degrees of freedom
AIC: 3110.2
Number of Fisher Scoring iterations: 5
fitLogitint <- glm(
accuracy ~ age * group,
data = d,
family = binomial(link = "logit"),
weights = rep(k, nrow(d))
)
effP <- data.frame(
allEffects(
fitLogitint,
xlevels = list(age = seq(min(d$age), max(d$age), .05))
)[["age:group"]]
)
effP$group <- as.factor(effP$group)
ggplot(d, aes(x = age, y = accuracy, shape = group, color = group)) +
geom_point(size = 4, alpha = .6) +
geom_ribbon(
data = effP,
aes(x = age, ymin = lower, ymax = upper, group = group, fill = group),
alpha = .3, color = NA,
inherit.aes = FALSE
) +
geom_line(
data = effP,
aes(x = age, y = fit, group = group, linetype = group, color = group),
linewidth = 2,
inherit.aes = FALSE
) +
scale_color_manual(values = c("darkorange2", "darkgreen")) +
scale_fill_manual(values = c("darkorange2", "darkgreen")) +
theme(text = element_text(size = ts, color = "black")) +
scale_x_continuous(breaks = seq(floor(min(d$age)), ceiling(max(d$age)), .5)) +
scale_y_continuous(breaks = seq(0, 1, .1), limits = c(0, 1)) +
ylab("accuracy") + xlab("Age (years)")

I did not simulate an interaction, so BOTH models find an interaction that is not there.
let's try the multiple-alternative forced choice probit link (guessing rate 50%, because of the true/false task)
using a GLM with family = binomial(link = mafc.probit(.m = 2)), no interaction emerges, as it should be
library(psyphy)  # mafc.probit() comes from the psyphy package
fitM <- glm(accuracy ~ age * group, data = d,
            family = binomial(link = mafc.probit(.m = 2)),
            weights = rep(k, nrow(d)))
summary(fitM)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)),
data = d, weights = rep(k, nrow(d)))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.90127 0.03396 55.983 <2e-16 ***
age 0.95067 0.02879 33.024 <2e-16 ***
group1 -0.97128 0.03943 -24.631 <2e-16 ***
age:group1 0.06149 0.03622 1.697 0.0896 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8867.40 on 999 degrees of freedom
Residual deviance: 902.75 on 996 degrees of freedom
AIC: 3007.3
Number of Fisher Scoring iterations: 5
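The inverse of the mafc.probit(.m = 2) link can be sketched by hand in base R (a hand-rolled function for illustration only, not the psyphy implementation): with a 2AFC guessing floor of 1/2, predicted accuracy can never drop below chance.

```r
# inverse link for a 2-alternative forced-choice probit model:
# chance level is 1/2, so fitted accuracy lives in [0.5, 1]
inv_mafc_probit_2 <- function(eta) 0.5 + 0.5 * pnorm(eta)

inv_mafc_probit_2(c(-10, 0, 10))  # approx. 0.50, 0.75, 1.00
```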
equal intervals on X correspond to equal intervals on Y
[put the example variable names on the x and y axes]
equal intervals on X correspond to equal ratios (NOT equal intervals) on Y
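A numeric sketch of the contrast (toy numbers, not the poster's data): on the identity link, equal steps in X add a constant to Y, while on a log link equal steps in X multiply Y by a constant (on the logit link the constant ratio is in the odds, not in Y itself).

```r
x <- 0:4

# identity link: equal x steps -> equal differences in y
y_id <- 2 + 3 * x
diff(y_id)                          # 3 3 3 3

# log link: equal x steps -> equal ratios in y
y_log <- exp(0.5 + 0.7 * x)
y_log[-1] / y_log[-length(y_log)]   # all equal to exp(0.7)
```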
Building a model means trying to find the generative process of the data, which, unlike in the world of simulations, we can never know for sure
to do that we must make important decisions
choosing the most appropriate family of distributions, to make sure that the new values of the dependent variable we are predicting lie within its bounds
choosing the most appropriate link function: otherwise it is very likely that you end up finding non-linear effects (i.e., interactions) that are not there!
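The second point can be demonstrated in a few lines of simulation (a toy sketch with assumed parameter values; the poster's own simulation lives in datasim.R): data are generated with no interaction on the logit scale, yet a gaussian identity-link fit reports a highly significant one.

```r
set.seed(123)
n <- 1000; k <- 40                        # children and trials per child
age   <- runif(n, -1, 1)                  # centered age, arbitrary scale
group <- rep(0:1, each = n / 2)
eta   <- 2 + 1.5 * age - 1.5 * group      # purely additive on the logit scale
acc   <- rbinom(n, k, plogis(eta)) / k    # observed proportion correct

fit_wrong <- glm(acc ~ age * group)       # misspecified: gaussian, identity
coef(summary(fit_wrong))["age:group", "Pr(>|t|)"]  # spurious interaction
```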
We are conducting a systematic review of how often the wrong link function is used in psychological research and how often it leads to a significant interaction: so far, quite often
All materials are available on GitHub at sitalaura/link-functions
Questions and feedback: laura.sita@studenti.unipd.it
Domingue, B. W., Kanopka, K., Trejo, S., Rhemtulla, M., & Tucker-Drob, E. M. (2024). Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome's distribution and metric properties. Psychological Methods, 29(6), 1164.
Hardwicke, T. E., Thibault, R. T., Clarke, B., Moodie, N., Crüwell, S., Schiavone, S. R., Handcock, S. A., Nghiem, K. A., Mody, F., Eerola, T., et al. (2024). Prevalence of transparent research practices in psychology: A cross-sectional study of empirical articles published in 2022. Advances in Methods and Practices in Psychological Science, 7(4), 25152459241283477.
Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology, 79, 328-348.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156.
Special thanks to
Cognitive Science Arena 2026